Return candidates from all data sources on id search #6184

Merged: snejus merged 7 commits into master from return-candidates-from-all-data-sources-on-id-search on Mar 10, 2026

snejus (Member) commented Nov 23, 2025

Closes #6178 (multiple metadata source results per ID) and #6181 (duplicate/overwrite of candidates).

I am also refactoring a couple of other things in the beets.autotag.match module because it is a hot mess.

Copilot AI review requested due to automatic review settings on November 23, 2025 14:24
snejus requested review from a team and semohr as code owners on November 23, 2025 14:24
github-actions commented

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

sourcery-ai bot (Contributor) left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The Candidates type alias is defined as dict[Info.Identifier, AnyMatch] but then used as Candidates[AlbumMatch]/Candidates[TrackMatch], which isn’t a parametrizable generic; consider either making Candidates a TypeAlias with two type parameters (key/value) or annotating the dicts directly to avoid confusing/misleading typing.
  • Moving the album_matched event emission into AlbumMatch.__post_init__ makes constructing AlbumMatch objects have side effects everywhere; consider using a factory/helper (or an explicit method) to emit the event so that simple instantiation stays side-effect-free and easier to reason about.
  • In _add_candidate, the duplicate check mixes info.album_id and info.identifier while the candidates dict is keyed by identifier; simplifying this to only use identifier for both the truthiness check and the lookup would make the intent clearer and avoid relying on album_id being non-empty.
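
One way to address the first point, sketched with hypothetical stand-in classes rather than the real beets types: if `AnyMatch` is declared as a `TypeVar`, an alias that mentions it becomes a generic alias and can be subscripted as `Candidates[AlbumMatch]`. The `Identifier` alias, the `distance` field, and the `best` helper are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Tuple, TypeVar

# Hypothetical stand-in for the (data_source, id) key described in the review.
Identifier = Tuple[str, str]


@dataclass
class AlbumMatch:
    distance: float


@dataclass
class TrackMatch:
    distance: float


# A TypeVar constrained to the two match types.
AnyMatch = TypeVar("AnyMatch", AlbumMatch, TrackMatch)

# Because the alias contains a TypeVar, it is generic and may be subscripted.
Candidates = Dict[Identifier, AnyMatch]


def best(candidates: Candidates[AnyMatch]) -> AnyMatch:
    """Return the candidate with the smallest distance (illustrative helper)."""
    return min(candidates.values(), key=lambda match: match.distance)


albums: Candidates[AlbumMatch] = {
    ("musicbrainz", "abc"): AlbumMatch(0.2),
    ("discogs", "abc"): AlbumMatch(0.1),
}
```

Whether this or plain `dict[...]` annotations reads better in the actual module is a judgment call; the sketch only shows that the parametrized form can be made valid.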
## Individual Comments

### Comment 1
<location> `beets/autotag/match.py:203-204` </location>
<code_context>
         return

     # Prevent duplicates.
-    if info.album_id and info.album_id in results:
+    if info.album_id and info.identifier in results:
         log.debug("Duplicate.")
         return
</code_context>

<issue_to_address>
**issue (bug_risk):** The duplicate check guards on `album_id` while `results` is keyed by identifier tuples, so deduplication is skipped whenever `album_id` is empty.

Since `results` is keyed by `info.identifier` (`data_source`, `id`), this condition should be based solely on the identifier. The `album_id` guard is now misleading and may skip intended deduping. Consider removing the `album_id` check and using only `if info.identifier in results:` (or otherwise aligning the condition with how keys are stored).
</issue_to_address>
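
A minimal sketch of the identifier-only check suggested in Comment 1. The `Info` class and `add_candidate` function here are simplified, hypothetical stand-ins, not the code in `beets/autotag/match.py`, and the sketch does not decide how candidates with an empty ID should be treated.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Identifier = Tuple[str, str]  # (data_source, id)


@dataclass
class Info:
    """Hypothetical stand-in for an album info object."""

    data_source: str
    album_id: str

    @property
    def identifier(self) -> Identifier:
        return (self.data_source, self.album_id)


def add_candidate(results: Dict[Identifier, Info], info: Info) -> bool:
    """Store info unless an entry with the same identifier already exists.

    Returns True when the candidate was added, False for a duplicate.
    """
    if info.identifier in results:
        return False
    results[info.identifier] = info
    return True


results: Dict[Identifier, Info] = {}
added = [
    add_candidate(results, Info("MusicBrainz", "abc")),
    add_candidate(results, Info("Discogs", "abc")),      # same id, other source
    add_candidate(results, Info("MusicBrainz", "abc")),  # true duplicate
]
```

With the check and the dictionary key both using the identifier, the same ID from two sources is kept twice while a repeat from the same source is dropped.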

### Comment 2
<location> `beets/metadata_plugins.py:58-62` </location>
<code_context>
-    A single ID can yield just a single track, so we return the first match.
-    """
+@notify_info_yielded("trackinfo_received")
+def tracks_for_ids(_id: str) -> Iterable[TrackInfo]:
+    """Return matching albums from all metadata sources for the given ID."""
     for plugin in find_metadata_source_plugins():
-        if info := plugin.track_for_id(_id):
</code_context>

<issue_to_address>
**nitpick (typo):** Docstring for tracks_for_ids mentions albums instead of tracks.

The description looks copied from `albums_for_ids` and should say "tracks" instead of "albums" to match the function’s purpose and avoid confusing metadata source plugin implementors.

```suggestion
@notify_info_yielded("trackinfo_received")
def tracks_for_ids(_id: str) -> Iterable[TrackInfo]:
    """Return matching tracks from all metadata sources for the given ID."""
    for plugin in find_metadata_source_plugins():
        yield from plugin.tracks_for_ids([_id])
```
</issue_to_address>

### Comment 3
<location> `beets/autotag/match.py:284-294` </location>
<code_context>
        if candidates and not config["import"]["timid"]:
            # If we have a very good MBID match, return immediately.
            # Otherwise, this match will compete against metadata-based
            # matches.
            if rec == Recommendation.strong:
                log.debug("ID match.")
                return (
                    cur_artist,
                    cur_album,
                    Proposal(list(candidates.values()), rec),
                )

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/merge-nested-ifs))

```suggestion
        if candidates and not config["import"]["timid"] and rec == Recommendation.strong:
            log.debug("ID match.")
            return (
                cur_artist,
                cur_album,
                Proposal(list(candidates.values()), rec),
            )

```

<br/><details><summary>Explanation</summary>Too much nesting can make code difficult to understand, and this is especially
true in Python, where there are no brackets to help out with the delineation of
different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the autotag matching system to support returning candidates from multiple metadata sources when searching by ID, and fixes an issue where candidates with duplicate IDs from different sources would overwrite each other.

  • Changes metadata plugin API from album_for_id/track_for_id (returning single results) to albums_for_ids/tracks_for_ids (yielding multiple results from all sources)
  • Uses composite Info.identifier (tuple of data_source and id) as candidate dictionary keys to prevent cross-source ID collisions
  • Converts AlbumMatch and TrackMatch from NamedTuples to dataclasses and moves album_matched event emission to AlbumMatch.__post_init__ to deduplicate event firing
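
The first two points in the overview can be illustrated with a small, self-contained sketch. Everything here is a simplified assumption: `FakeSource`, the `AlbumInfo` fields, and the signature of `albums_for_ids` are hypothetical and do not reproduce the beets plugin API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple

Identifier = Tuple[str, str]  # (data_source, id)


@dataclass(frozen=True)
class AlbumInfo:
    """Hypothetical, simplified stand-in for an album info object."""

    data_source: str
    album_id: str
    album: str

    @property
    def identifier(self) -> Identifier:
        return (self.data_source, self.album_id)


class FakeSource:
    """Hypothetical metadata source backed by an in-memory catalogue."""

    def __init__(self, name: str, catalogue: Dict[str, str]) -> None:
        self.name = name
        self.catalogue = catalogue

    def album_for_id(self, _id: str) -> Optional[AlbumInfo]:
        title = self.catalogue.get(_id)
        return AlbumInfo(self.name, _id, title) if title is not None else None


def albums_for_ids(
    sources: Iterable[FakeSource], ids: Iterable[str]
) -> Iterator[AlbumInfo]:
    """Yield every match from every source instead of stopping at the first."""
    wanted = list(ids)
    for source in sources:
        for _id in wanted:
            info = source.album_for_id(_id)
            if info is not None:
                yield info


sources = [
    FakeSource("MusicBrainz", {"123": "Album (MB)"}),
    FakeSource("Discogs", {"123": "Album (Discogs)"}),
]
matches = list(albums_for_ids(sources, ["123"]))

# Keyed by the bare id, the second source overwrites the first.
by_id = {info.album_id: info for info in matches}
# Keyed by the composite identifier, both candidates survive.
by_identifier = {info.identifier: info for info in matches}
```

In this sketch `by_id` ends up with one entry and `by_identifier` with two, which is the collision the composite key is meant to prevent.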

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test/test_autotag.py | Removes assignment tests (moved to new test file) and unused import |
| test/autotag/test_match.py | New test file containing moved assignment tests plus new tests for multi-source ID matching scenarios |
| beets/metadata_plugins.py | Replaces single-result album_for_id/track_for_id functions with multi-result albums_for_ids/tracks_for_ids generators; updates base class method signatures to properly filter None values |
| beets/autotag/match.py | Simplifies match_by_id, updates candidate dictionary to use composite identifiers, removes manual album_matched event calls (now in dataclass), removes unused plugins import |
| beets/autotag/hooks.py | Adds Info.identifier property, converts Match classes to dataclasses with __post_init__ for event emission |
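
The `__post_init__` event emission described for beets/autotag/hooks.py can be sketched as follows. The tiny `register`/`send` event registry and the `AlbumMatch` fields are hypothetical; beets has its own plugin event mechanism, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical, minimal event registry used only for this illustration.
_listeners: Dict[str, List[Callable[..., Any]]] = {}


def register(event: str, callback: Callable[..., Any]) -> None:
    """Subscribe callback to the named event."""
    _listeners.setdefault(event, []).append(callback)


def send(event: str, **kwargs: Any) -> None:
    """Call every callback registered for the named event."""
    for callback in _listeners.get(event, []):
        callback(**kwargs)


@dataclass
class AlbumMatch:
    distance: float
    album: str

    def __post_init__(self) -> None:
        # Runs after the generated __init__, so every instantiation emits.
        send("album_matched", match=self)


seen: List[str] = []
register("album_matched", lambda match: seen.append(match.album))
AlbumMatch(distance=0.1, album="Example")
```

After the last line `seen` contains `"Example"`. The event fires once per constructed object with no extra call sites, which is the deduplication benefit, and it also fires for objects built in tests or helpers, which is the side-effect concern Sourcery raised.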

snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from 95fecc5 to c8c62b3 on November 23, 2025 14:28
snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch 6 times, most recently from a4109de to 7282ede, on December 3, 2025 02:17
snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from 7282ede to e89d97d on December 5, 2025 08:37
codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 80.23256% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.52%. Comparing base (44dc3cd) to head (35361a6).
⚠️ Report is 8 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| beets/autotag/match.py | 77.77% | 5 Missing and 3 partials ⚠️ |
| beets/metadata_plugins.py | 79.31% | 4 Missing and 2 partials ⚠️ |
| beetsplug/missing.py | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6184      +/-   ##
==========================================
+ Coverage   69.42%   69.52%   +0.09%     
==========================================
  Files         141      141              
  Lines       18452    18475      +23     
  Branches     3020     3020              
==========================================
+ Hits        12811    12844      +33     
+ Misses       5004     4997       -7     
+ Partials      637      634       -3     

| Files with missing lines | Coverage Δ |
| --- | --- |
| beets/autotag/hooks.py | 99.30% <100.00%> (+0.08%) ⬆️ |
| beetsplug/mbsync.py | 82.05% <100.00%> (+0.23%) ⬆️ |
| beetsplug/missing.py | 57.83% <0.00%> (-0.71%) ⬇️ |
| beets/metadata_plugins.py | 84.17% <79.31%> (-2.35%) ⬇️ |
| beets/autotag/match.py | 84.82% <77.77%> (+7.90%) ⬆️ |

... and 1 file with indirect coverage changes

snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from e89d97d to 079749c on December 26, 2025 19:24
JOJ0 added the core label (Pull requests that modify the beets core `beets`) on Jan 10, 2026
snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from 079749c to b73d48a on January 12, 2026 17:36
snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from b73d48a to 835cf4f on March 8, 2026 12:36
snejus requested a review from Copilot on March 8, 2026 12:37
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 20 comments.

snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch 3 times, most recently from 1cf45db to 50fd8cb, on March 9, 2026 20:14
snejus requested a review from Copilot on March 9, 2026 20:14
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from 50fd8cb to 396e906 on March 10, 2026 00:35
snejus added 7 commits on March 10, 2026 00:55
These functions now accept both an ID and a data_source parameter,
enabling plugins like mbsync and missing to retrieve metadata from the
correct source.

Update mbsync and missing plugins to use the restored functions with
explicit data_source parameters. Add data_source validation to prevent
lookups when the source is not specified.

Add get_metadata_source helper function to retrieve plugins by their
data_source name, cached for performance.
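
The commit notes above describe a cached lookup by data_source name and a guard against lookups without a source. A rough sketch of that idea, assuming `functools.lru_cache` for the caching and a hypothetical in-memory plugin list; the `MetadataSource` class, the `_PLUGINS` list, and the exact signatures are illustrative and are not taken from beets/metadata_plugins.py.

```python
from functools import lru_cache
from typing import Dict, List, Optional


class MetadataSource:
    """Hypothetical metadata source plugin with an in-memory catalogue."""

    def __init__(self, data_source: str, albums: Dict[str, str]) -> None:
        self.data_source = data_source
        self.albums = albums

    def album_for_id(self, _id: str) -> Optional[str]:
        return self.albums.get(_id)


# Hypothetical registry standing in for the loaded plugins.
_PLUGINS: List[MetadataSource] = [
    MetadataSource("MusicBrainz", {"1": "Album from MusicBrainz"}),
    MetadataSource("Discogs", {"1": "Album from Discogs"}),
]


@lru_cache(maxsize=None)
def get_metadata_source(name: str) -> Optional[MetadataSource]:
    """Return the plugin whose data_source equals name, caching the result."""
    for plugin in _PLUGINS:
        if plugin.data_source == name:
            return plugin
    return None


def album_for_id(_id: str, data_source: str) -> Optional[str]:
    """Look up an album in one specific source; skip when no source is given."""
    if not data_source:
        return None
    plugin = get_metadata_source(data_source)
    return plugin.album_for_id(_id) if plugin is not None else None
```

Here `album_for_id("1", "Discogs")` consults only the Discogs stand-in, and an empty `data_source` returns `None` without touching any plugin. A cache like this assumes the set of plugins does not change after the first lookup.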
snejus force-pushed the return-candidates-from-all-data-sources-on-id-search branch from 396e906 to 35361a6 on March 10, 2026 00:56
snejus merged commit abd77b3 into master on Mar 10, 2026
20 checks passed
snejus deleted the return-candidates-from-all-data-sources-on-id-search branch on March 10, 2026 01:03
Labels

core Pull requests that modify the beets core `beets`

4 participants